neural text generation
A Contrastive Framework for Neural Text Generation
Text generation is of great importance to many natural language processing applications. However, maximization-based decoding methods (e.g., beam search) of neural language models often lead to degenerate solutions---the generated text is unnatural and contains undesirable repetitions. Existing approaches introduce stochasticity via sampling or modify training objectives to decrease the probabilities of certain tokens (e.g., unlikelihood training). However, they often lead to solutions that lack coherence. In this work, we show that an underlying reason for model degeneration is the anisotropic distribution of token representations. We present a contrastive solution: (i) SimCTG, a contrastive training objective to calibrate the model's representation space, and (ii) a decoding method---contrastive search---to encourage diversity while maintaining coherence in the generated text. Extensive experiments and analyses on three benchmarks from two languages demonstrate that our proposed approach outperforms state-of-the-art text generation methods as evaluated by both human and automatic metrics.
Learning to Break the Loop: Analyzing and Mitigating Repetitions for Neural Text Generation
While large-scale neural language models, such as GPT2 and BART,have achieved impressive results on various text generation tasks, they tend to get stuck in undesirable sentence-level loops with maximization-based decoding algorithms (\textit{e.g.}, greedy search). This phenomenon is counter-intuitive since there are few consecutive sentence-level repetitions in the human corpus (e.g., 0.02\% in Wikitext-103). To investigate the underlying reasons for generating consecutive sentence-level repetitions, we study the relationship between the probability of repetitive tokens and their previous repetitions in context. Through our quantitative experiments, we find that 1) Models have a preference to repeat the previous sentence; 2) The sentence-level repetitions have a \textit{self-reinforcement effect}: the more times a sentence is repeated in the context, the higher the probability of continuing to generate that sentence; 3) The sentences with higher initial probabilities usually have a stronger self-reinforcement effect. Motivated by our findings, we propose a simple and effective training method \textbf{DITTO} (Pseu\underline{D}o-Repet\underline{IT}ion Penaliza\underline{T}i\underline{O}n), where the model learns to penalize probabilities of sentence-level repetitions from synthetic repetitive data. Although our method is motivated by mitigating repetitions, our experiments show that DITTO not only mitigates the repetition issue without sacrificing perplexity, but also achieves better generation quality. Extensive experiments on open-ended text generation (Wikitext-103) and text summarization (CNN/DailyMail) demonstrate the generality and effectiveness of our method.
Appendix of ' Learning to Break the Loop: Analyzing and Mitigating Repetitions for Neural Text Generation '
We calculate it for each sequence x and average over the whole corpus. When decoding auto-regressively, the probabilities of the repetitive sentence loops also have a self-reinforcement effect. As shown in Figure 2, the probability of the token'located' increases almost The work was conducted in Apple. Here we use the end token to split sentences for ease of experiments. We present the probability of the token'located' ( y-axis) as the number of historical repetitions Best viewed in color and zoomed in a desktop monitor.
- North America > United States > Utah > Salt Lake County > Salt Lake City (0.04)
- North America > United States > Ohio (0.04)
- North America > United States > Missouri > Buchanan County > Saint Joseph (0.04)
- (6 more...)
- Media > Film (1.00)
- Government (1.00)
- Leisure & Entertainment > Sports > Basketball (0.46)
A Contrastive Framework for Neural Text Generation
Text generation is of great importance to many natural language processing applications. However, maximization-based decoding methods (e.g., beam search) of neural language models often lead to degenerate solutions---the generated text is unnatural and contains undesirable repetitions. Existing approaches introduce stochasticity via sampling or modify training objectives to decrease the probabilities of certain tokens (e.g., unlikelihood training). However, they often lead to solutions that lack coherence. In this work, we show that an underlying reason for model degeneration is the anisotropic distribution of token representations.
Learning to Break the Loop: Analyzing and Mitigating Repetitions for Neural Text Generation
While large-scale neural language models, such as GPT2 and BART,have achieved impressive results on various text generation tasks, they tend to get stuck in undesirable sentence-level loops with maximization-based decoding algorithms (\textit{e.g.}, greedy search). This phenomenon is counter-intuitive since there are few consecutive sentence-level repetitions in the human corpus (e.g., 0.02\% in Wikitext-103). To investigate the underlying reasons for generating consecutive sentence-level repetitions, we study the relationship between the probability of repetitive tokens and their previous repetitions in context. Through our quantitative experiments, we find that 1) Models have a preference to repeat the previous sentence; 2) The sentence-level repetitions have a \textit{self-reinforcement effect}: the more times a sentence is repeated in the context, the higher the probability of continuing to generate that sentence; 3) The sentences with higher initial probabilities usually have a stronger self-reinforcement effect. Motivated by our findings, we propose a simple and effective training method \textbf{DITTO} (Pseu\underline{D}o-Repet\underline{IT}ion Penaliza\underline{T}i\underline{O}n), where the model learns to penalize probabilities of sentence-level repetitions from synthetic repetitive data.
Generating Human-level Text with Contrastive Search in Transformers 🤗
In this blog, we introduce the current state-of-the-art decoding method, Contrastive Search, for neural text generation. Contrastive search is originally proposed in "A Contrastive Framework for Neural Text Generation" [1] ([Paper][Official Implementation]) at NeurIPS 2022. Moreover, in this follow-up work, "Contrastive Search Is What You Need For Neural Text Generation" [2] ([Paper] [Official Implementation]), the authors further demonstrate that contrastive search can generate human-level text using off-the-shelf language models across 16 languages. Contrastive Search is now available on transformers, both on PyTorch and TensorFlow. You can interact with the examples shown in this blog post using your framework of choice in this colab notebook, which is linked at the top.
Neural Text Generation with Part-of-Speech Guided Softmax
Neural text generation models are likely to suffer from the low-diversity problem. Various decoding strategies and training-based methods have been proposed to promote diversity only by exploiting contextual features, but rarely do they consider incorporating syntactic structure clues. In this work, we propose using linguistic annotation, i.e., part-of-speech (POS), to guide the text generation. In detail, we introduce POS Guided Softmax (POSG-Softmax) to explicitly model two posterior probabilities: (i) next-POS, and (ii) next-token from the vocabulary of the target POS. A POS guided sampling strategy is further proposed to address the low-diversity problem by enriching the diversity of POS. Extensive experiments and human evaluations demonstrate that, compared with existing state-of-the-art methods, our proposed methods can generate more diverse text while maintaining comparable quality.
- Asia > North Korea (0.07)
- Europe > Russia > Northwestern Federal District > Murmansk Oblast > Murmansk (0.04)
- Asia > South Korea (0.04)